Sample Efficient Policy Search for Optimal Stopping Domains
Abstract
Optimal stopping problems consider the question of deciding when to stop an observation-generating process in order to maximize a return. We examine the problem of simultaneously learning and planning in such domains, when data is collected directly from the environment. We propose GFSE, a simple and flexible model-free policy search method that reuses data for sample efficiency by leveraging problem structure. We bound the sample complexity of our approach to guarantee uniform convergence of policy value estimates, tightening existing PAC bounds to achieve logarithmic dependence on horizon length for our setting. We also compare our method against prevalent model-based and model-free approaches on three domains drawn from diverse fields.
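The abstract does not spell out how GFSE reuses data. In many optimal stopping formulations, the decision to stop does not change the observations the process would otherwise have produced, so a trajectory recorded all the way to the horizon can be replayed offline to evaluate any stopping policy. The Python sketch below illustrates only that general idea under stated assumptions: the threshold policy class, the hypothetical per-step waiting cost `STEP_COST`, the uniform observation model, and the helper names (`stopping_return`, `estimate_values`, `collect_full_trajectories`) are all illustrative and are not the GFSE algorithm from the paper.

```python
import random
from typing import Callable, Dict, List, Sequence

# Hypothetical reward model (illustrative assumption, not from the paper):
# stopping at step t after observing x_t yields x_t minus a per-step waiting cost.
STEP_COST = 0.01


def stopping_return(trajectory: Sequence[float], threshold: float) -> float:
    """Replay a threshold policy on one fully recorded trajectory.

    The policy stops at the first step whose observation meets the threshold;
    if no observation does, it is forced to stop at the final step.
    """
    for t, x in enumerate(trajectory):
        if x >= threshold:
            return x - STEP_COST * t
    last = len(trajectory) - 1
    return trajectory[last] - STEP_COST * last


def estimate_values(
    trajectories: Sequence[Sequence[float]], thresholds: Sequence[float]
) -> Dict[float, float]:
    """Estimate every candidate policy's value from the SAME set of trajectories.

    Because stopping does not alter what the process emits, one batch of
    full trajectories is reused to score all candidate thresholds.
    """
    n = len(trajectories)
    return {
        th: sum(stopping_return(traj, th) for traj in trajectories) / n
        for th in thresholds
    }


def collect_full_trajectories(
    n: int, horizon: int, sample: Callable[[random.Random], float], seed: int = 0
) -> List[List[float]]:
    """Record n trajectories by never stopping early (hypothetical simulator)."""
    rng = random.Random(seed)
    return [[sample(rng) for _ in range(horizon)] for _ in range(n)]


if __name__ == "__main__":
    data = collect_full_trajectories(n=500, horizon=20, sample=lambda rng: rng.random())
    candidates = [i / 10 for i in range(11)]
    values = estimate_values(data, candidates)
    best = max(values, key=values.get)
    print(f"best threshold: {best}, estimated value: {values[best]:.3f}")
```

Here all eleven candidate thresholds are scored on the same 500 recorded trajectories, so the number of environment samples does not grow with the number of policies evaluated; evaluating each candidate with its own fresh on-policy rollouts would multiply the sample cost by the size of the policy class.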
Similar sources
Sample Efficient Bayesian Optimization for Policy Search: Case Studies in Robotics and Education
In this work we investigate the problem of learning adaptive strategies, called policies, in domains where evaluating different policies is costly. We formalize the problem as direct policy search: searching the space of policy parameters to identify policies that perform well with respect to a given objective. Bayesian Optimization is one method suitable for such settings, when sample/data eff...
Optimal Stopping Policy for Multivariate Sequences a Generalized Best Choice Problem
In the classical versions of the “Best Choice Problem”, the sequence of offers is a random sample from a single known distribution. We present an extension of this problem in which the sequential offers are random variables but from multiple independent distributions. Each distribution function represents a class of investment or offers. Offers appear without any specified order. The objective is...
Optimal Placement and Sizing of TCSC & SVC for Improvement Power System Operation using Crow Search Algorithm
Abstract: The need for more efficient power systems has prompted the use of new technologies, including Flexible AC Transmission System (FACTS) devices. FACTS devices provide new opportunities for controlling line power flow and minimizing losses while keeping bus voltages within permissible limits. In this thesis a new method is proposed for optimal placement and sizing of Thyristo...
Optimal design for multi-arm multi-stage clinical trials
In early stages of drug development there is often uncertainty about the most promising among a set of different treatments. In order to ensure the best use of resources it is important to decide which, if any, of the treatments should be taken forward for further testing. Multi-arm multi-stage (MAMS) trials provide gains in efficiency over separate randomised trials of each treatment. They all...
Optimal Search and Stop in Continuous Search Process
This paper investigates an optimal search policy with stopping for a stationary target being in one of n boxes. It is assumed that the search is conducted continuously with a total search cost C per unit time and the search in box i costs c_i per unit search effort. The conditional probability of detecting the target with unit search effort is α_i and a reward R_i is given to the searcher when he...
Publication date: 2017